AITopics | bayesian interpretation

Motivated by the sensitivity-based importance score of adaptive low-rank adaptation (AdaLoRA), we use more theoretically grounded metrics, including the signal-to-noise ratio (SNR), together with the Improved Variational Online Newton (IVON) optimizer, for adaptive parameter budget allocation. The resulting Bayesian counterpart not only matches or surpasses the performance obtained with the sensitivity-based importance metric but is also faster than AdaLoRA with Adam. Our theoretical analysis reveals a significant connection between the two metrics, providing a Bayesian perspective on the efficacy of sensitivity as an importance score. Furthermore, our findings suggest that the magnitude, rather than the variance, is the primary indicator of the importance of parameters.
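A minimal sketch of the two importance scores the abstract compares, assuming a diagonal Gaussian posterior (one mean and one standard deviation per parameter) of the kind a variational optimizer such as IVON maintains. The function names, the toy numbers, and the simple top-k budget rule are hypothetical illustrations; AdaLoRA itself scores singular-value triplets and smooths the scores over training, which is omitted here.

```python
import numpy as np

def snr_importance(mean, std):
    """Signal-to-noise ratio |mu| / sigma per parameter (diagonal Gaussian posterior)."""
    return np.abs(mean) / std

def sensitivity_importance(theta, grad):
    """Unsmoothed AdaLoRA-style sensitivity |theta * dL/dtheta|."""
    return np.abs(theta * grad)

def allocate_budget(scores, budget):
    """Boolean mask keeping the `budget` highest-scoring entries (hypothetical rule)."""
    flat = np.asarray(scores).ravel()
    keep = np.argsort(flat)[::-1][:budget]
    mask = np.zeros(flat.shape, dtype=bool)
    mask[keep] = True
    return mask.reshape(np.shape(scores))

# Toy posterior over four singular values of a hypothetical LoRA update.
mean = np.array([0.9, -0.05, 0.4, 0.01])
std = np.array([0.1, 0.1, 0.2, 0.1])
snr_mask = allocate_budget(snr_importance(mean, std), budget=2)
magnitude_mask = allocate_budget(np.abs(mean), budget=2)
print(snr_mask.tolist())  # [True, False, True, False]
```

In this hand-picked toy case, ranking by |mu| alone keeps the same entries as ranking by SNR, which is consistent with (but is not evidence for) the abstract's observation that magnitude rather than variance drives importance.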

adalora, fine-tuning, sensitivity

arXiv.org Machine Learning

2409.10673

Country:

Europe > Austria > Vienna (0.14)
Europe > Switzerland > Vaud > Lausanne (0.04)
Africa > Rwanda > Kigali > Kigali (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Putting Bayes to sleep

Neural Information Processing Systems, Mar-14-2024, 07:04:54 GMT

We consider sequential prediction algorithms that are given the predictions from a set of models as inputs. If the nature of the data is changing over time in that different models predict well on different segments of the data, then adaptivity is typically achieved by mixing into the weights in each round a bit of the initial prior (kind of like a weak restart). However, what if the favored models in each segment are from a small subset, i.e. the data is likely to be predicted well by models that predicted well before? Curiously, fitting such "sparse composite models" is achieved by mixing in a bit of all the past posteriors. This self-referential updating method is rather peculiar, but it is efficient and gives superior performance on many natural data sets. Also it is important because it introduces a long-term memory: any model that has done well in the past can be recovered quickly. While Bayesian interpretations can be found for mixing in a bit of the initial prior, no Bayesian interpretation is known for mixing in past posteriors. We build atop the "specialist" framework from the online learning literature to give the Mixing Past Posteriors update a proper Bayesian foundation. We apply our method to a well-studied multitask learning problem and obtain a new intriguing efficient update that achieves a significantly better bound.
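The abstract contrasts two ways of adding adaptivity to Bayesian mixing over a set of models: mixing a bit of the initial prior back in each round (Fixed Share) and mixing in a bit of all past posteriors (Mixing Past Posteriors). A minimal sketch of both for log loss follows, assuming the "uniform past" mixing scheme in which the current posterior keeps weight 1 - alpha and the initial prior plus all earlier posteriors share alpha equally. The function names and the choice of scheme are illustrative assumptions; the specialist-based construction the paper builds is not reproduced.

```python
import numpy as np

def bayes_update(weights, losses):
    """Bayes posterior when losses are log losses, i.e. losses[i] = -log p_i(x_t)."""
    v = weights * np.exp(-losses)
    return v / v.sum()

def fixed_share(posterior, prior, alpha):
    """Mix a fraction alpha of the initial prior back in (a 'weak restart')."""
    return (1 - alpha) * posterior + alpha * prior

def mixing_past_posteriors(posterior, past_posteriors, alpha):
    """Mix a fraction alpha of the average of the prior and all earlier posteriors."""
    return (1 - alpha) * posterior + alpha * np.mean(past_posteriors, axis=0)

def cumulative_log_loss(loss_matrix, alpha, scheme):
    """Total mixture log loss over T rounds; loss_matrix has shape (T, n_models)."""
    T, n = loss_matrix.shape
    prior = np.full(n, 1.0 / n)
    w = prior.copy()
    history = [prior]  # the initial prior acts as "posterior 0"
    total = 0.0
    for t in range(T):
        total += -np.log(w @ np.exp(-loss_matrix[t]))  # log loss of the mixture
        v = bayes_update(w, loss_matrix[t])
        if scheme == "fixed_share":
            w = fixed_share(v, prior, alpha)
        elif scheme == "mpp":
            w = mixing_past_posteriors(v, history, alpha)
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        history.append(v)
    return total
```

With alpha = 0 both schemes reduce to plain Bayes. The MPP variant keeps every earlier posterior in the mix, which is what lets a model that did well in an earlier segment regain weight quickly when its segment recurs.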

algorithm, prediction, specialist

Neural Information Processing Systems

Country:

North America > United States > Utah (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Industry: Education > Educational Setting > Online (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Posterior Consistency of the Silverman g-prior in Bayesian Model Choice

Neural Information Processing Systems, Apr-6-2023, 14:23:35 GMT

Kernel supervised learning methods can be unified using tools from regularization theory. The duality between regularization and priors leads to interpreting regularization methods in terms of maximum a posteriori estimation and has motivated Bayesian interpretations of kernel methods. In this paper we pursue a Bayesian interpretation of sparsity in the kernel setting by using a mixture of a point-mass distribution and a prior, which we refer to as "Silverman's g-prior." We provide a theoretical analysis of the posterior consistency of a Bayesian model choice procedure based on this prior. We also establish the asymptotic relationship between this procedure and the Bayesian information criterion.
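For reference, the Bayesian information criterion (BIC) mentioned above is, for a model M with k_M free parameters, n observations, and maximized likelihood \hat{L}_M, given below together with the Schwarz (Laplace) approximation to the log marginal likelihood from which it comes. This is the textbook definition only; the specific asymptotic relationship established for Silverman's g-prior is not reproduced here.

```latex
\mathrm{BIC}(M) = -2\log \hat{L}_M + k_M \log n,
\qquad
\log p(\mathbf{y}\mid M) = \log \hat{L}_M - \frac{k_M}{2}\log n + O_p(1)
= -\tfrac{1}{2}\,\mathrm{BIC}(M) + O_p(1).
```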

Putting Bayes to sleep

Neural Information Processing Systems, Apr-6-2023, 12:18:14 GMT

We consider sequential prediction algorithms that are given the predictions from a set of models as inputs. If the nature of the data is changing over time in that different models predict well on different segments of the data, then adaptivity is typically achieved by mixing into the weights in each round a bit of the initial prior (kind of like a weak restart). However, what if the favored models in each segment are from a small subset, i.e. the data is likely to be predicted well by models that predicted well before? Curiously, fitting such "sparse composite models" is achieved by mixing in a bit of all the past posteriors. This self-referential updating method is rather peculiar, but it is efficient and gives superior performance on many natural data sets.

bayesian interpretation, posterior

Neural Information Processing Systems

Industry: Education (0.39)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)

Posterior Consistency of the Silverman g-prior in Bayesian Model Choice

Zhang, Zhihua, Jordan, Michael I., Yeung, Dit-Yan

Neural Information Processing Systems, Feb-15-2020, 04:11:23 GMT

Kernel supervised learning methods can be unified using tools from regularization theory. The duality between regularization and priors leads to interpreting regularization methods in terms of maximum a posteriori estimation and has motivated Bayesian interpretations of kernel methods. In this paper we pursue a Bayesian interpretation of sparsity in the kernel setting by using a mixture of a point-mass distribution and a prior, which we refer to as "Silverman's g-prior." We provide a theoretical analysis of the posterior consistency of a Bayesian model choice procedure based on this prior. We also establish the asymptotic relationship between this procedure and the Bayesian information criterion.

Putting Bayes to sleep

Adamskiy, Dmitry, Warmuth, Manfred K., Koolen, Wouter M.

Neural Information Processing Systems, Feb-14-2020, 21:27:18 GMT

We consider sequential prediction algorithms that are given the predictions from a set of models as inputs. If the nature of the data is changing over time in that different models predict well on different segments of the data, then adaptivity is typically achieved by mixing into the weights in each round a bit of the initial prior (kind of like a weak restart). However, what if the favored models in each segment are from a small subset, i.e. the data is likely to be predicted well by models that predicted well before? Curiously, fitting such "sparse composite models" is achieved by mixing in a bit of all the past posteriors. This self-referential updating method is rather peculiar, but it is efficient and gives superior performance on many natural data sets.

artificial intelligence, bayesian interpretation, machine learning

Neural Information Processing Systems

Industry: Education (0.39)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.43)

Bayesian interpretation of SGD as Ito process

Yokoi, Soma, Sato, Issei

arXiv.org Machine Learning, Nov-20-2019

The current interpretation of stochastic gradient descent (SGD) as a stochastic process lacks generality in that its numerical scheme restricts continuous-time dynamics as well as the loss function and the distribution of gradient noise. We introduce a simplified scheme with milder conditions that flexibly interprets SGD as a discrete-time approximation of an Ito process. The scheme also works as a common foundation of SGD and stochastic gradient Langevin dynamics (SGLD), providing insights into their asymptotic properties. We investigate the convergence of SGD with biased gradient in terms of the equilibrium mode and the overestimation problem of the second moment of SGLD.
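A minimal sketch of SGD and SGLD read as discretizations of a continuous-time process, assuming a one-dimensional quadratic potential U(theta) = theta^2 / 2 (so exp(-U) is the standard normal) and Gaussian gradient noise standing in for real minibatch noise. The SGLD step below is an Euler-Maruyama step of d theta = -U'(theta) dt + sqrt(2) dW; the potential, step size, noise model, and function names are hypothetical choices, not the paper's scheme.

```python
import numpy as np

def grad_U(theta):
    # Potential U(theta) = theta^2 / 2, whose Gibbs density exp(-U) is N(0, 1).
    return theta

def sgd_step(theta, lr, rng, noise_scale=0.0):
    # Gradient step with additive Gaussian gradient noise (a stand-in for minibatch noise).
    g = grad_U(theta) + noise_scale * rng.standard_normal(theta.shape)
    return theta - lr * g

def sgld_step(theta, lr, rng):
    # Euler-Maruyama step of d(theta) = -grad U dt + sqrt(2) dW with step size lr.
    return theta - lr * grad_U(theta) + np.sqrt(2 * lr) * rng.standard_normal(theta.shape)

def simulate(step, lr=0.01, n_chains=5000, n_steps=2000, seed=0, **kw):
    # Run many independent chains from theta = 3 and return their final states.
    rng = np.random.default_rng(seed)
    theta = np.full(n_chains, 3.0)
    for _ in range(n_steps):
        theta = step(theta, lr, rng, **kw)
    return theta

sgld_samples = simulate(sgld_step)
sgd_samples = simulate(sgd_step, noise_scale=1.0)
```

Here the SGLD chain is an AR(1) process with stationary variance 1 / (1 - lr/2), slightly above the target variance of 1, and this discretization bias vanishes as lr goes to 0. SGD with the same gradient noise has stationary variance lr / (2 - lr), so it stays concentrated near the mode instead of sampling the posterior.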

artificial intelligence, bayesian interpretation, machine learning

arXiv.org Machine Learning

1911.09011

Country:

Europe > Netherlands (0.04)
Europe > Denmark (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.90)

The Rational and Computational Scope of Probabilistic Rule-Based Expert Systems

Schocken, Shimon

arXiv.org Artificial Intelligence, Mar-27-2013

This paper studies the underlying rationality of those languages on the grounds of syntax and calculus. Some implications of the theorem for the relationship between the CF and Bayesian languages and the Dempster-Shafer Theory of Evidence are presented. In order for a computer program to be a plausible model of a (more or less) rational process of human expertise, the program should be capable of representing beliefs in a language that is (more or less) calibrated with a well-specified normative criterion, e.g. the axioms of Subjective Probability [15], the Theory of Confirmation [...] Tversky, the building blocks of a probabilistic language are syntax, calculus, and semantics [18]. The syntax is a set of numbers, commonly referred to as Degrees of Belief (e.g. standard probabilities or Certainty Factors), which are used to parameterize uncertain facts, inexact rules, and competing hypotheses.
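To make the contrast between the CF and Bayesian "calculus" concrete, a minimal sketch of the parallel-combination rule commonly given for MYCIN/EMYCIN-style certainty factors, next to a Bayesian counterpart that adds log likelihood ratios. The Bayesian function assumes the pieces of evidence are conditionally independent given the hypothesis; both function names are hypothetical, and this illustrates the two calculi only, not the paper's theorem.

```python
import math

def combine_cf(x, y):
    """Parallel combination of two certainty factors in [-1, 1] (EMYCIN-style rule)."""
    if not (-1 <= x <= 1 and -1 <= y <= 1):
        raise ValueError("certainty factors must lie in [-1, 1]")
    if x >= 0 and y >= 0:
        return x + y * (1 - x)
    if x < 0 and y < 0:
        return x + y * (1 + x)
    denom = 1 - min(abs(x), abs(y))
    if denom == 0:
        raise ValueError("combining +1 and -1 is undefined")
    return (x + y) / denom

def bayes_posterior(prior, likelihood_ratios):
    """Posterior probability from a prior and likelihood ratios, assuming the
    pieces of evidence are conditionally independent given the hypothesis."""
    log_odds = math.log(prior / (1 - prior))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)
    return 1 / (1 + math.exp(-log_odds))
```

For example, combine_cf(0.6, 0.5) gives 0.8, while bayes_posterior(0.5, [3.0, 3.0]) turns prior odds of 1 into posterior odds of 9, a probability of 0.9. The CF rule needs no prior at all, which is one of the points where the two languages differ.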

artificial intelligence, bayesian inference, machine learning

arXiv.org Artificial Intelligence

1304.3105

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > Texas (0.04)
North America > United States > California > Santa Clara County > Los Altos (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Putting Bayes to sleep

Adamskiy, Dmitry, Warmuth, Manfred K., Koolen, Wouter M.

Neural Information Processing Systems, Dec-31-2012

We consider sequential prediction algorithms that are given the predictions from a set of models as inputs. If the nature of the data is changing over time in that different models predict well on different segments of the data, then adaptivity is typically achieved by mixing into the weights in each round a bit of the initial prior (kind of like a weak restart). However, what if the favored models in each segment are from a small subset, i.e. the data is likely to be predicted well by models that predicted well before? Curiously, fitting such "sparse composite models" is achieved by mixing in a bit of all the past posteriors. This self-referential updating method is rather peculiar, but it is efficient and gives superior performance on many natural data sets. Also it is important because it introduces a long-term memory: any model that has done well in the past can be recovered quickly. While Bayesian interpretations can be found for mixing in a bit of the initial prior, no Bayesian interpretation is known for mixing in past posteriors. We build atop the "specialist" framework from the online learning literature to give the Mixing Past Posteriors update a proper Bayesian foundation. We apply our method to a well-studied multitask learning problem and obtain a new intriguing efficient update that achieves a significantly better bound.

artificial intelligence, machine learning, specialist

Neural Information Processing Systems

Country:

North America > United States > Utah (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Industry: Education > Educational Setting > Online (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Conditional Models on the Ranking Poset

Lebanon, Guy, Lafferty, John D.

Neural Information Processing Systems, Dec-31-2003

A distance-based conditional model on the ranking poset is presented for use in classification and ranking. The model is an extension of the Mallows model, and generalizes the classifier combination methods used by several ensemble learning algorithms, including error correcting output codes, discrete AdaBoost, logistic regression and cranking. The algebraic structure of the ranking poset leads to a simple Bayesian interpretation of the conditional model and its special cases. In addition to a unifying view, the framework suggests a probabilistic interpretation for error correcting output codes and an extension beyond the binary coding scheme.
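The conditional model above extends the Mallows model, a distance-based distribution over rankings in which p(pi) is proportional to exp(-theta * d(pi, sigma)) around a central ranking sigma. A minimal sketch of the plain (unconditional) Mallows model with Kendall tau distance, computed by brute-force enumeration so it only suits a handful of items. The choice of distance, the dispersion parameter theta, and the function names are assumptions, and the paper's conditional extension is not shown.

```python
import math
from itertools import permutations

def kendall_tau_distance(pi, sigma):
    """Number of item pairs that the two rankings order differently."""
    pos = {item: i for i, item in enumerate(sigma)}
    r = [pos[item] for item in pi]
    return sum(1 for i in range(len(r)) for j in range(i + 1, len(r)) if r[i] > r[j])

def mallows_distribution(sigma, theta):
    """Exact Mallows probabilities p(pi) proportional to exp(-theta * d(pi, sigma))."""
    perms = list(permutations(sigma))
    weights = [math.exp(-theta * kendall_tau_distance(p, sigma)) for p in perms]
    z = sum(weights)
    return {p: w / z for p, w in zip(perms, weights)}

dist = mallows_distribution(("a", "b", "c"), theta=1.0)
mode = max(dist, key=dist.get)  # the central ranking ("a", "b", "c")
```

For Kendall tau distance the normalizer factorizes, so with q = exp(-theta) and three items it equals (1 + q)(1 + q + q^2); enumeration is used here only because it makes the definition explicit.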

artificial intelligence, machine learning, poset

Neural Information Processing Systems

Country: